{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyN9IzRpcqeueXesGrMNrZ0r",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/kaixindelele/ChatPaper/blob/main/ChatPaper.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "aSNXwIULm9GZ",
        "outputId": "aaf89fa1-2bfd-4fa1-9107-01073e8729a5"
      },
      "outputs": [],
      "source": [
        "!pip install arxiv PyMuPDF requests tiktoken tenacity pybase64 Pillow openai markdown gradio"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "from IPython.core.interactiveshell import InteractiveShell\n",
        "import numpy as np\n",
        "import os\n",
        "import re\n",
        "import datetime\n",
        "import arxiv\n",
        "import openai, tenacity\n",
        "import base64, requests\n",
        "import argparse\n",
        "import configparser\n",
        "import json\n",
        "import tiktoken\n",
        "import fitz, io, os\n",
        "from PIL import Image\n",
        "\n",
        "\n",
        "class Paper:\n",
        "    def __init__(self, path, title='', url='', abs='', authers=None):\n",
        "        # Initialize a Paper object from a PDF path.\n",
        "        self.url = url            # paper URL\n",
        "        self.path = path          # PDF file path\n",
        "        self.section_names = []   # section titles\n",
        "        self.section_texts = {}   # section contents\n",
        "        self.abs = abs\n",
        "        self.title_page = 0\n",
        "        if title == '':\n",
        "            self.pdf = fitz.open(self.path) # the PDF document\n",
        "            self.title = self.get_title()\n",
        "            self.parse_pdf()\n",
        "        else:\n",
        "            self.title = title\n",
        "        # avoid a shared mutable default argument\n",
        "        self.authers = authers if authers is not None else []\n",
        "        self.roman_num = [\"I\", \"II\", \"III\", \"IV\", \"V\", \"VI\", \"VII\", \"VIII\", \"IX\", \"X\"]\n",
        "        self.digit_num = [str(d+1) for d in range(10)]\n",
        "        self.first_image = ''\n",
        "        \n",
        "    def parse_pdf(self):\n",
        "        self.pdf = fitz.open(self.path) # the PDF document\n",
        "        self.text_list = [page.get_text() for page in self.pdf]\n",
        "        self.all_text = ' '.join(self.text_list)\n",
        "        self.section_page_dict = self._get_all_page_index() # maps section name -> page index\n",
        "        print(\"section_page_dict\", self.section_page_dict)\n",
        "        self.section_text_dict = self._get_all_page() # maps section name -> section text\n",
        "        self.section_text_dict.update({\"title\": self.title})\n",
        "        self.section_text_dict.update({\"paper_info\": self.get_paper_info()})\n",
        "        self.pdf.close()\n",
        "\n",
        "    def get_paper_info(self):\n",
        "        first_page_text = self.pdf[self.title_page].get_text()\n",
        "        if \"Abstract\" in self.section_text_dict.keys():\n",
        "            abstract_text = self.section_text_dict['Abstract']\n",
        "        else:\n",
        "            abstract_text = self.abs\n",
        "        # guard against papers without an 'Introduction' heading\n",
        "        introduction_text = self.section_text_dict.get('Introduction', '')\n",
        "        first_page_text = first_page_text.replace(abstract_text, \"\").replace(introduction_text, \"\")\n",
        "        return first_page_text\n",
        "        \n",
        "    def get_image_path(self, image_path=''):\n",
        "        \"\"\"\n",
        "        Extract the largest image in the PDF, save it locally (e.g. image.png),\n",
        "        and return its file path so it can be read back (e.g. via gitee).\n",
        "        :param image_path: directory to save the extracted image in\n",
        "        :return: (saved file path, image extension), or (None, None)\n",
        "        \"\"\"\n",
        "        max_size = 0\n",
        "        image_list = []\n",
        "        with fitz.Document(self.path) as my_pdf_file:\n",
        "            # iterate over all pages\n",
        "            for page_number in range(1, len(my_pdf_file) + 1):\n",
        "                page = my_pdf_file[page_number - 1]\n",
        "                # iterate over all images on the current page\n",
        "                for image_number, image in enumerate(page.get_images(), start=1):\n",
        "                    # look up the image by its xref and extract it\n",
        "                    xref_value = image[0]\n",
        "                    base_image = my_pdf_file.extract_image(xref_value)\n",
        "                    image_bytes = base_image[\"image\"]\n",
        "                    ext = base_image[\"ext\"]\n",
        "                    image = Image.open(io.BytesIO(image_bytes))\n",
        "                    image_size = image.size[0] * image.size[1]\n",
        "                    if image_size > max_size:\n",
        "                        max_size = image_size\n",
        "                    # keep the extension with its image, so the largest image\n",
        "                    # is saved with its own format rather than the last one's\n",
        "                    image_list.append((image, ext))\n",
        "        for image, ext in image_list:\n",
        "            image_size = image.size[0] * image.size[1]\n",
        "            if image_size == max_size:\n",
        "                image_name = f\"image.{ext}\"\n",
        "                im_path = os.path.join(image_path, image_name)\n",
        "                print(\"im_path:\", im_path)\n",
        "\n",
        "                # scale the longer side down to at most 480 px, keeping the aspect ratio\n",
        "                max_pix = 480\n",
        "                if image.size[0] > image.size[1]:\n",
        "                    min_pix = int(image.size[1] * (max_pix/image.size[0]))\n",
        "                    newsize = (max_pix, min_pix)\n",
        "                else:\n",
        "                    min_pix = int(image.size[0] * (max_pix/image.size[1]))\n",
        "                    newsize = (min_pix, max_pix)\n",
        "                image = image.resize(newsize)\n",
        "\n",
        "                image.save(im_path)\n",
        "                return im_path, ext\n",
        "        return None, None\n",
        "    \n",
        "    # Identify section headings (e.g. \"1. Introduction\") and return them as a list.\n",
        "    def get_chapter_names(self):\n",
        "        doc = fitz.open(self.path) # the PDF document\n",
        "        text_list = [page.get_text() for page in doc]\n",
        "        all_text = ''.join(text_list)\n",
        "        # collect lines that look like numbered section headings\n",
        "        chapter_names = []\n",
        "        for line in all_text.split('\\n'):\n",
        "            if '.' in line:\n",
        "                point_split_list = line.split('.')\n",
        "                space_split_list = line.split(' ')\n",
        "                if 1 < len(space_split_list) < 5:\n",
        "                    if 1 < len(point_split_list) < 5 and (point_split_list[0] in self.roman_num or point_split_list[0] in self.digit_num):\n",
        "                        print(\"line:\", line)\n",
        "                        chapter_names.append(line)\n",
        "\n",
        "        return chapter_names\n",
        "        \n",
        "    def get_title(self):\n",
        "        doc = self.pdf # the open PDF document\n",
        "        max_font_size = 0 # largest font size seen so far\n",
        "        max_string = \"\" # text carrying the largest font size\n",
        "        max_font_sizes = [0]\n",
        "        for page_index, page in enumerate(doc): # first pass: collect font sizes\n",
        "            text = page.get_text(\"dict\") # structured text for the page\n",
        "            blocks = text[\"blocks\"] # list of text blocks\n",
        "            for block in blocks:\n",
        "                if block[\"type\"] == 0 and len(block['lines']): # text block\n",
        "                    if len(block[\"lines\"][0][\"spans\"]):\n",
        "                        font_size = block[\"lines\"][0][\"spans\"][0][\"size\"] # size of the first span of the first line\n",
        "                        max_font_sizes.append(font_size)\n",
        "                        if font_size > max_font_size:\n",
        "                            max_font_size = font_size\n",
        "                            max_string = block[\"lines\"][0][\"spans\"][0][\"text\"]\n",
        "        max_font_sizes.sort()\n",
        "        print(\"max_font_sizes\", max_font_sizes[-10:])\n",
        "        cur_title = ''\n",
        "        for page_index, page in enumerate(doc): # second pass: stitch together spans set in the two largest sizes\n",
        "            text = page.get_text(\"dict\")\n",
        "            blocks = text[\"blocks\"]\n",
        "            for block in blocks:\n",
        "                if block[\"type\"] == 0 and len(block['lines']):\n",
        "                    if len(block[\"lines\"][0][\"spans\"]):\n",
        "                        cur_string = block[\"lines\"][0][\"spans\"][0][\"text\"]\n",
        "                        font_size = block[\"lines\"][0][\"spans\"][0][\"size\"]\n",
        "                        if abs(font_size - max_font_sizes[-1]) < 0.3 or abs(font_size - max_font_sizes[-2]) < 0.3:\n",
        "                            if len(cur_string) > 4 and \"arXiv\" not in cur_string:\n",
        "                                if cur_title == '':\n",
        "                                    cur_title += cur_string\n",
        "                                else:\n",
        "                                    cur_title += ' ' + cur_string\n",
        "                            self.title_page = page_index\n",
        "        title = cur_title.replace('\\n', ' ')\n",
        "        return title\n",
        "\n",
        "\n",
        "    def _get_all_page_index(self):\n",
        "        # section headings to look for\n",
        "        section_list = [\"Abstract\",\n",
        "                'Introduction', 'Related Work', 'Background',\n",
        "                \"Preliminary\", \"Problem Formulation\",\n",
        "                'Methods', 'Methodology', \"Method\", 'Approach', 'Approaches',\n",
        "                # experiment sections\n",
        "                \"Materials and Methods\", \"Experiment Settings\",\n",
        "                'Experiment', \"Experimental Results\", \"Evaluation\", \"Experiments\",\n",
        "                \"Results\", 'Findings', 'Data Analysis',\n",
        "                \"Discussion\", \"Results and Discussion\", \"Conclusion\",\n",
        "                'References']\n",
        "        # map each section found to the page index where it appears\n",
        "        section_page_dict = {}\n",
        "        for page_index, page in enumerate(self.pdf):\n",
        "            cur_text = page.get_text()\n",
        "            for section_name in section_list:\n",
        "                section_name_upper = section_name.upper()\n",
        "                # \"Abstract\" may appear mid-line, so match it anywhere\n",
        "                if \"Abstract\" == section_name and section_name in cur_text:\n",
        "                    section_page_dict[section_name] = page_index\n",
        "                # other headings must end a line, in either case\n",
        "                else:\n",
        "                    if section_name + '\\n' in cur_text:\n",
        "                        section_page_dict[section_name] = page_index\n",
        "                    elif section_name_upper + '\\n' in cur_text:\n",
        "                        section_page_dict[section_name] = page_index\n",
        "        return section_page_dict\n",
        "\n",
        "    def _get_all_page(self):\n",
        "        \"\"\"\n",
        "        Collect the text of every section in the PDF.\n",
        "\n",
        "        Returns:\n",
        "            section_dict (dict): maps section name to section text.\n",
        "        \"\"\"\n",
        "        section_dict = {}\n",
        "        text_list = [page.get_text() for page in self.pdf]\n",
        "        for sec_index, sec_name in enumerate(self.section_page_dict):\n",
        "            print(sec_index, sec_name, self.section_page_dict[sec_name])\n",
        "            # skip the abstract if we already have it from arXiv metadata\n",
        "            if sec_index <= 0 and self.abs:\n",
        "                continue\n",
        "            else:\n",
        "                start_page = self.section_page_dict[sec_name]\n",
        "                if sec_index < len(list(self.section_page_dict.keys()))-1:\n",
        "                    end_page = self.section_page_dict[list(self.section_page_dict.keys())[sec_index+1]]\n",
        "                else:\n",
        "                    end_page = len(text_list)\n",
        "                print(\"start_page, end_page:\", start_page, end_page)\n",
        "                cur_sec_text = ''\n",
        "                if end_page - start_page == 0:\n",
        "                    # section starts and ends on the same page: slice between headings\n",
        "                    if sec_index < len(list(self.section_page_dict.keys()))-1:\n",
        "                        next_sec = list(self.section_page_dict.keys())[sec_index+1]\n",
        "                        if text_list[start_page].find(sec_name) == -1:\n",
        "                            start_i = text_list[start_page].find(sec_name.upper())\n",
        "                        else:\n",
        "                            start_i = text_list[start_page].find(sec_name)\n",
        "                        if text_list[start_page].find(next_sec) == -1:\n",
        "                            end_i = text_list[start_page].find(next_sec.upper())\n",
        "                        else:\n",
        "                            end_i = text_list[start_page].find(next_sec)\n",
        "                        cur_sec_text += text_list[start_page][start_i:end_i]\n",
        "                else:\n",
        "                    # include end_page itself so the section's last page is not dropped\n",
        "                    for page_i in range(start_page, min(end_page + 1, len(text_list))):\n",
        "                        if page_i == start_page:\n",
        "                            if text_list[start_page].find(sec_name) == -1:\n",
        "                                start_i = text_list[start_page].find(sec_name.upper())\n",
        "                            else:\n",
        "                                start_i = text_list[start_page].find(sec_name)\n",
        "                            cur_sec_text += text_list[page_i][start_i:]\n",
        "                        elif page_i < end_page:\n",
        "                            cur_sec_text += text_list[page_i]\n",
        "                        elif page_i == end_page:\n",
        "                            if sec_index < len(list(self.section_page_dict.keys()))-1:\n",
        "                                next_sec = list(self.section_page_dict.keys())[sec_index+1]\n",
        "                                # slice the page where the next section starts, not the start page\n",
        "                                if text_list[page_i].find(next_sec) == -1:\n",
        "                                    end_i = text_list[page_i].find(next_sec.upper())\n",
        "                                else:\n",
        "                                    end_i = text_list[page_i].find(next_sec)\n",
        "                                cur_sec_text += text_list[page_i][:end_i]\n",
        "                section_dict[sec_name] = cur_sec_text.replace('-\\n', '').replace('\\n', ' ')\n",
        "        return section_dict\n",
        "                \n",
        "\n",
        "# The Reader drives the search -> download -> summarize pipeline.\n",
        "class Reader:\n",
        "    def __init__(self, key_word, query, filter_keys,\n",
        "                 root_path='./',\n",
        "                 gitee_key='',\n",
        "                 sort=arxiv.SortCriterion.SubmittedDate, user_name='default', args=None):\n",
        "        self.user_name = user_name # reader's name\n",
        "        self.key_word = key_word # keywords the reader cares about\n",
        "        self.query = query # the reader's search query\n",
        "        self.sort = sort # sort order for search results\n",
        "        # default to Chinese for any unrecognized language code\n",
        "        self.language = 'English' if args.language == 'en' else 'Chinese'\n",
        "        self.filter_keys = filter_keys # keywords used to filter abstracts\n",
        "        self.root_path = root_path\n",
        "\n",
        "        self.chat_api_list = [args.api_key]\n",
        "        self.cur_api = 0\n",
        "        self.file_format = args.file_format\n",
        "\n",
        "        self.max_token_num = 4096\n",
        "        self.encoding = tiktoken.get_encoding(\"gpt2\")\n",
        "                \n",
        "    def get_arxiv(self, max_results=30):\n",
        "        search = arxiv.Search(query=self.query,\n",
        "                              max_results=max_results,                              \n",
        "                              sort_by=self.sort,\n",
        "                              sort_order=arxiv.SortOrder.Descending,\n",
        "                              )       \n",
        "        return search\n",
        "     \n",
        "    def filter_arxiv(self, max_results=30):\n",
        "        search = self.get_arxiv(max_results=max_results)\n",
        "        print(\"all search:\")\n",
        "        for index, result in enumerate(search.results()):\n",
        "            print(index, result.title, result.updated)\n",
        "            # print the abstract too, to inspect what each result actually is\n",
        "            print(\"abs_text:\", result.summary.replace('-\\n', '-').replace('\\n', ' '))\n",
        "            print(\"-\"*30)\n",
        "            \n",
        "        filter_results = []   \n",
        "        filter_keys = self.filter_keys\n",
        "        \n",
        "        print(\"filter_keys:\", self.filter_keys)\n",
        "        # a paper qualifies only if every filter keyword appears in its abstract\n",
        "        for index, result in enumerate(search.results()):\n",
        "            abs_text = result.summary.replace('-\\n', '-').replace('\\n', ' ')\n",
        "            meet_num = 0\n",
        "            for f_key in filter_keys.split(\" \"):\n",
        "                if f_key.lower() in abs_text.lower():\n",
        "                    meet_num += 1\n",
        "            if meet_num == len(filter_keys.split(\" \")):\n",
        "                filter_results.append(result)\n",
        "                # break\n",
        "        print(\"Number of papers left after filtering:\")\n",
        "        print(\"filter_results:\", len(filter_results))\n",
        "        print(\"filter_papers:\")\n",
        "        for index, result in enumerate(filter_results):\n",
        "            print(index, result.title, result.updated)\n",
        "        return filter_results\n",
        "    \n",
        "    def validateTitle(self, title):\n",
        "        # sanitize the title so it can be used as a file name\n",
        "        rstr = r\"[\\/\\\\\\:\\*\\?\\\"\\<\\>\\|]\" # '/ \\ : * ? \" < > |'\n",
        "        new_title = re.sub(rstr, \"_\", title) # replace forbidden characters with underscores\n",
        "        return new_title\n",
        "\n",
        "    def download_pdf(self, filter_results):\n",
        "        # 先创建文件夹\n",
        "        date_str = str(datetime.datetime.now())[:13].replace(' ', '-')        \n",
        "        key_word = str(self.key_word.replace(':', ' '))        \n",
        "        path = self.root_path  + 'pdf_files/' + self.query.replace('au: ', '').replace('title: ', '').replace('ti: ', '').replace(':', ' ')[:25] + '-' + date_str\n",
        "        try:\n",
        "            os.makedirs(path)\n",
        "        except:\n",
        "            pass\n",
        "        print(\"All_paper:\", len(filter_results))\n",
        "        # 开始下载：\n",
        "        paper_list = []\n",
        "        for r_index, result in enumerate(filter_results):\n",
        "            try:\n",
        "                title_str = self.validateTitle(result.title)\n",
        "                pdf_name = title_str+'.pdf'\n",
        "                # result.download_pdf(path, filename=pdf_name)\n",
        "                self.try_download_pdf(result, path, pdf_name)\n",
        "                paper_path = os.path.join(path, pdf_name)\n",
        "                print(\"paper_path:\", paper_path)\n",
        "                paper = Paper(path=paper_path,\n",
        "                              url=result.entry_id,\n",
        "                              title=result.title,\n",
        "                              abs=result.summary.replace('-\\n', '-').replace('\\n', ' '),\n",
        "                              authers=[str(aut) for aut in result.authors],\n",
        "                              )\n",
        "                # Download finished; start parsing:\n",
        "                paper.parse_pdf()\n",
        "                paper_list.append(paper)\n",
        "            except Exception as e:\n",
        "                print(\"download_error:\", e)\n",
        "        return paper_list\n",
        "    \n",
        "    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),\n",
        "                    stop=tenacity.stop_after_attempt(5),\n",
        "                    reraise=True)\n",
        "    def try_download_pdf(self, result, path, pdf_name):\n",
        "        result.download_pdf(path, filename=pdf_name)\n",
        "        \n",
        "    def summary_with_chat(self, paper_list):\n",
        "        htmls = []\n",
        "        for paper_index, paper in enumerate(paper_list):\n",
        "            # Step 1: summarize using the title, abstract, and introduction.\n",
        "            text = ''\n",
        "            text += 'Title:' + paper.title\n",
        "            text += 'Url:' + paper.url\n",
        "            text += 'Abstract:' + paper.abs\n",
        "            text += 'Paper_info:' + paper.section_text_dict['paper_info']\n",
        "            # intro\n",
        "            text += list(paper.section_text_dict.values())[0]\n",
        "            \n",
        "            chat_summary_text = self.chat_summary(text=text)            \n",
        "            htmls.append('## Paper:' + str(paper_index+1))\n",
        "            htmls.append('\\n\\n\\n')            \n",
        "            htmls.append(chat_summary_text)\n",
        "            \n",
        "            # Step 2: summarize the methods:\n",
        "            # TODO: some papers name the methods section after their algorithm, so simple keyword filtering misses it; a better scheme is needed later.\n",
        "            method_key = ''\n",
        "            for parse_key in paper.section_text_dict.keys():\n",
        "                if 'method' in parse_key.lower() or 'approach' in parse_key.lower():\n",
        "                    method_key = parse_key\n",
        "                    break\n",
        "                \n",
        "            if method_key != '':\n",
        "                text = ''\n",
        "                method_text = ''\n",
        "                summary_text = ''\n",
        "                summary_text += \"<summary>\" + chat_summary_text\n",
        "                # methods                \n",
        "                method_text += paper.section_text_dict[method_key]                   \n",
        "                text = summary_text + \"\\n\\n<Methods>:\\n\\n\" + method_text                 \n",
        "                chat_method_text = self.chat_method(text=text)\n",
        "                htmls.append(chat_method_text)\n",
        "            else:\n",
        "                chat_method_text = ''\n",
        "            htmls.append(\"\\n\"*4)\n",
        "            \n",
        "            # Step 3: summarize the whole paper and score it:\n",
        "            conclusion_key = ''\n",
        "            for parse_key in paper.section_text_dict.keys():\n",
        "                if 'conclu' in parse_key.lower():\n",
        "                    conclusion_key = parse_key\n",
        "                    break\n",
        "            \n",
        "            text = ''\n",
        "            conclusion_text = ''\n",
        "            summary_text = ''\n",
        "            summary_text += \"<summary>\" + chat_summary_text + \"\\n <Method summary>:\\n\" + chat_method_text            \n",
        "            if conclusion_key != '':\n",
        "                # conclusion                \n",
        "                conclusion_text += paper.section_text_dict[conclusion_key]                                \n",
        "                text = summary_text + \"\\n\\n<Conclusion>:\\n\\n\" + conclusion_text \n",
        "            else:\n",
        "                text = summary_text            \n",
        "            chat_conclusion_text = self.chat_conclusion(text=text)\n",
        "            htmls.append(chat_conclusion_text)\n",
        "            htmls.append(\"\\n\"*4)\n",
        "            \n",
        "            # Combine into one file and save it.\n",
        "            date_str = str(datetime.datetime.now())[:13].replace(' ', '-')\n",
        "            export_path = os.path.join(self.root_path, 'export')\n",
        "            os.makedirs(export_path, exist_ok=True)\n",
        "            mode = 'w' if paper_index == 0 else 'a'\n",
        "            file_name = os.path.join(export_path, date_str+'-'+self.validateTitle(paper.title)+\".\"+self.file_format)\n",
        "            self.export_to_markdown(\"\\n\".join(htmls), file_name=file_name, mode=mode)\n",
        "            \n",
        "            htmls = []\n",
        "    \n",
        "    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),\n",
        "                    stop=tenacity.stop_after_attempt(5),\n",
        "                    reraise=True)\n",
        "    def chat_conclusion(self, text):\n",
        "        openai.api_key = self.chat_api_list[self.cur_api]\n",
        "        # Rotate to the next API key for the following call\n",
        "        self.cur_api = (self.cur_api + 1) % len(self.chat_api_list)\n",
        "        conclusion_prompt_token = 650        \n",
        "        text_token = len(self.encoding.encode(text))\n",
        "        clip_text_index = int(len(text)*(self.max_token_num-conclusion_prompt_token)/text_token)\n",
        "        clip_text = text[:clip_text_index]   \n",
        "        \n",
        "        messages=[\n",
        "                {\"role\": \"system\", \"content\": \"You are a reviewer in the field of [\"+self.key_word+\"] and you need to critically review this article\"},  # ChatGPT role\n",
        "                {\"role\": \"assistant\", \"content\": \"This is the <summary> and <conclusion> part of an English paper; you have already produced the <summary>, and for the <conclusion> part I need your help to answer the following questions:\"+clip_text},  # background knowledge; cf. the OpenReview review process\n",
        "                {\"role\": \"user\", \"content\": \"\"\"                 \n",
        "                 8. Make the following summary. Be sure to use {} answers (proper nouns need to be marked in English).\n",
        "                    - (1):What is the significance of this piece of work?\n",
        "                    - (2):Summarize the strengths and weaknesses of this article in three dimensions: innovation point, performance, and workload.                   \n",
        "                    .......\n",
        "                 Follow the format of the output later: \n",
        "                 8. Conclusion: \\n\\n\n",
        "                    - (1):xxx;\\n                     \n",
        "                    - (2):Innovation point: xxx; Performance: xxx; Workload: xxx;\\n                      \n",
        "                 \n",
        "                 Be sure to use {} answers (proper nouns need to be marked in English), keep statements as concise and academic as possible, do not repeat the content of the previous <summary>, use the original numerical values, strictly follow the format, output the corresponding content to xxx, use \\n for line breaks; ....... means fill in according to the actual requirements, and may be omitted if not applicable.\n",
        "                 \"\"\".format(self.language, self.language)},\n",
        "            ]\n",
        "        response = openai.ChatCompletion.create(\n",
        "            model=\"gpt-3.5-turbo\",\n",
        "            # The prompt is kept in English to use fewer tokens.\n",
        "            messages=messages,\n",
        "        )\n",
        "        result = ''\n",
        "        for choice in response.choices:\n",
        "            result += choice.message.content\n",
        "        print(\"conclusion_result:\\n\", result)\n",
        "        print(\"prompt_token_used:\", response.usage.prompt_tokens,\n",
        "              \"completion_token_used:\", response.usage.completion_tokens,\n",
        "              \"total_token_used:\", response.usage.total_tokens)\n",
        "        print(\"response_time:\", response.response_ms/1000.0, 's')             \n",
        "        return result            \n",
        "    \n",
        "    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),\n",
        "                    stop=tenacity.stop_after_attempt(5),\n",
        "                    reraise=True)\n",
        "    def chat_method(self, text):\n",
        "        openai.api_key = self.chat_api_list[self.cur_api]\n",
        "        # Rotate to the next API key for the following call\n",
        "        self.cur_api = (self.cur_api + 1) % len(self.chat_api_list)\n",
        "        method_prompt_token = 650        \n",
        "        text_token = len(self.encoding.encode(text))\n",
        "        clip_text_index = int(len(text)*(self.max_token_num-method_prompt_token)/text_token)\n",
        "        clip_text = text[:clip_text_index]        \n",
        "        messages=[\n",
        "                {\"role\": \"system\", \"content\": \"You are a researcher in the field of [\"+self.key_word+\"] who is good at summarizing papers using concise statements\"},  # ChatGPT role\n",
        "                {\"role\": \"assistant\", \"content\": \"This is the <summary> and <Method> part of an English paper; you have already produced the <summary>, and for the <Methods> part I need your help to read and summarize the following questions.\"+clip_text},  # background knowledge\n",
        "                {\"role\": \"user\", \"content\": \"\"\"                 \n",
        "                 7. Describe in detail the methodological idea of this article. Be sure to use {} answers (proper nouns need to be marked in English). For example, its steps are.\n",
        "                    - (1):...\n",
        "                    - (2):...\n",
        "                    - (3):...\n",
        "                    - .......\n",
        "                 Follow the format of the output that follows: \n",
        "                 7. Methods: \\n\\n\n",
        "                    - (1):xxx;\\n \n",
        "                    - (2):xxx;\\n \n",
        "                    - (3):xxx;\\n  \n",
        "                    ....... \\n\\n     \n",
        "                 \n",
        "                 Be sure to use {} answers (proper nouns need to be marked in English), keep statements as concise and academic as possible, do not repeat the content of the previous <summary>, use the original numerical values, strictly follow the format, output the corresponding content to xxx, use \\n for line breaks; ....... means fill in according to the actual requirements, and may be omitted if not applicable.\n",
        "                 \"\"\".format(self.language, self.language)},\n",
        "            ]\n",
        "        response = openai.ChatCompletion.create(\n",
        "            model=\"gpt-3.5-turbo\",\n",
        "            messages=messages,\n",
        "        )\n",
        "        result = ''\n",
        "        for choice in response.choices:\n",
        "            result += choice.message.content\n",
        "        print(\"method_result:\\n\", result)\n",
        "        print(\"prompt_token_used:\", response.usage.prompt_tokens,\n",
        "              \"completion_token_used:\", response.usage.completion_tokens,\n",
        "              \"total_token_used:\", response.usage.total_tokens)\n",
        "        print(\"response_time:\", response.response_ms/1000.0, 's') \n",
        "        return result\n",
        "    \n",
        "    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),\n",
        "                    stop=tenacity.stop_after_attempt(5),\n",
        "                    reraise=True)\n",
        "    def chat_summary(self, text):\n",
        "        openai.api_key = self.chat_api_list[self.cur_api]\n",
        "        # Rotate to the next API key for the following call\n",
        "        self.cur_api = (self.cur_api + 1) % len(self.chat_api_list)\n",
        "        summary_prompt_token = 1000        \n",
        "        text_token = len(self.encoding.encode(text))\n",
        "        clip_text_index = int(len(text)*(self.max_token_num-summary_prompt_token)/text_token)\n",
        "        clip_text = text[:clip_text_index]\n",
        "        messages=[\n",
        "                {\"role\": \"system\", \"content\": \"You are a researcher in the field of [\"+self.key_word+\"] who is good at summarizing papers using concise statements\"},\n",
        "                {\"role\": \"assistant\", \"content\": \"This is the title, author, link, abstract and introduction of an English document. I need your help to read and summarize the following questions: \"+clip_text},\n",
        "                {\"role\": \"user\", \"content\": \"\"\"                 \n",
        "                 1. Mark the title of the paper (with Chinese translation)\n",
        "                 2. List all the authors' names (use English)\n",
        "                 3. Mark the first author's affiliation (output {} translation only)\n",
        "                 4. Mark the keywords of this article (use English)\n",
        "                 5. Link to the paper and its Github code (fill in Github:None if unavailable)\n",
        "                 6. Summarize according to the following four points. Be sure to use {} answers (proper nouns need to be marked in English)\n",
        "                    - (1):What is the research background of this article?\n",
        "                    - (2):What are the past methods? What are the problems with them? Is the approach well motivated?\n",
        "                    - (3):What is the research methodology proposed in this paper?\n",
        "                    - (4):On what task and what performance is achieved by the methods in this paper? Can the performance support their goals?\n",
        "                 Follow the format of the output that follows:                  \n",
        "                 1. Title: xxx\\n\\n\n",
        "                 2. Authors: xxx\\n\\n\n",
        "                 3. Affiliation: xxx\\n\\n                 \n",
        "                 4. Keywords: xxx\\n\\n   \n",
        "                 5. Urls: xxx or xxx , xxx \\n\\n      \n",
        "                 6. Summary: \\n\\n\n",
        "                    - (1):xxx;\\n \n",
        "                    - (2):xxx;\\n \n",
        "                    - (3):xxx;\\n  \n",
        "                    - (4):xxx.\\n\\n     \n",
        "                 \n",
        "                 Be sure to use {} answers (proper nouns need to be marked in English), keep statements as concise and academic as possible, avoid repetitive information, use the original numerical values, strictly follow the format, output the corresponding content to xxx, and use \\n for line breaks.\n",
        "                 \"\"\".format(self.language, self.language, self.language)},\n",
        "            ]\n",
        "                \n",
        "        response = openai.ChatCompletion.create(\n",
        "            model=\"gpt-3.5-turbo\",\n",
        "            messages=messages,\n",
        "        )\n",
        "        result = ''\n",
        "        for choice in response.choices:\n",
        "            result += choice.message.content\n",
        "        print(\"summary_result:\\n\", result)\n",
        "        print(\"prompt_token_used:\", response.usage.prompt_tokens,\n",
        "              \"completion_token_used:\", response.usage.completion_tokens,\n",
        "              \"total_token_used:\", response.usage.total_tokens)\n",
        "        print(\"response_time:\", response.response_ms/1000.0, 's')                    \n",
        "        return result        \n",
        "                        \n",
        "    def export_to_markdown(self, text, file_name, mode='w'):\n",
        "        # Optionally convert the text to HTML with the markdown module:\n",
        "        # html = markdown.markdown(text)\n",
        "        # Open the file in the given mode\n",
        "        with open(file_name, mode, encoding=\"utf-8\") as f:\n",
        "            # Write the content to the file\n",
        "            f.write(text)\n",
        "\n",
        "    # Print the reader's configuration\n",
        "    def show_info(self):        \n",
        "        print(f\"Key word: {self.key_word}\")\n",
        "        print(f\"Query: {self.query}\")\n",
        "        print(f\"Sort: {self.sort}\")                \n",
        "\n",
        "def main(args):       \n",
        "    # Create a Reader object and call its show_info method\n",
        "    if args.sort == 'Relevance':\n",
        "        sort = arxiv.SortCriterion.Relevance\n",
        "    elif args.sort == 'LastUpdatedDate':\n",
        "        sort = arxiv.SortCriterion.LastUpdatedDate\n",
        "    else:\n",
        "        sort = arxiv.SortCriterion.Relevance\n",
        "        \n",
        "    if args.pdf_path:\n",
        "        reader1 = Reader(key_word=args.key_word, \n",
        "                query=args.query, \n",
        "                filter_keys=args.filter_keys,                                    \n",
        "                sort=sort, \n",
        "                args=args\n",
        "                )\n",
        "        reader1.show_info()\n",
        "        # Determine whether pdf_path is a single file or a directory:\n",
        "        paper_list = []     \n",
        "        if args.pdf_path.endswith(\".pdf\"):\n",
        "            paper_list.append(Paper(path=args.pdf_path))            \n",
        "        else:\n",
        "            for root, dirs, files in os.walk(args.pdf_path):\n",
        "                print(\"root:\", root, \"dirs:\", dirs, 'files:', files)  # current directory path\n",
        "                for filename in files:\n",
        "                    # If a PDF file is found, add it to the paper list\n",
        "                    if filename.endswith(\".pdf\"):\n",
        "                        paper_list.append(Paper(path=os.path.join(root, filename)))        \n",
        "        print(\"------------------paper_num: {}------------------\".format(len(paper_list)))        \n",
        "        for paper_index, paper_name in enumerate(paper_list):\n",
        "            print(paper_index, os.path.basename(paper_name.path))\n",
        "        reader1.summary_with_chat(paper_list=paper_list)\n",
        "    else:\n",
        "        reader1 = Reader(key_word=args.key_word, \n",
        "                query=args.query, \n",
        "                filter_keys=args.filter_keys,                                    \n",
        "                sort=sort, \n",
        "                args=args\n",
        "                )\n",
        "        reader1.show_info()\n",
        "        filter_results = reader1.filter_arxiv(max_results=args.max_results)\n",
        "        paper_list = reader1.download_pdf(filter_results)\n",
        "        reader1.summary_with_chat(paper_list=paper_list)\n",
        "    \n",
        "    \n",
        "if __name__ == '__main__':    \n",
        "    parser = argparse.ArgumentParser()    \n",
        "    parser.add_argument(\"--pdf_path\", type=str, default='', help=\"if empty, the bot downloads papers from arxiv using --query\")\n",
        "    parser.add_argument(\"--query\", type=str, default='all: reinforcement learning', help=\"the query string, e.g. ti: xx, au: xx, all: xx\")\n",
        "    parser.add_argument(\"--key_word\", type=str, default='deep reinforcement learning', help=\"the key word of the user's research field\")\n",
        "    parser.add_argument(\"--filter_keys\", type=str, default='reinforcement learning', help=\"the filter keywords; every word must appear in the abstract for a paper to be selected\")\n",
        "    parser.add_argument(\"--max_results\", type=int, default=2, help=\"the maximum number of results\")\n",
        "    parser.add_argument(\"--sort\", type=str, default=\"Relevance\", help=\"sort criterion: Relevance or LastUpdatedDate\")\n",
        "    parser.add_argument(\"--save_image\", default=False, help=\"save images? Saving a picture takes a minute or two, but it looks nice\")\n",
        "    parser.add_argument(\"--file_format\", type=str, default='md', help=\"export file format; use md if images are saved, otherwise txt keeps the layout clean\")\n",
        "    parser.add_argument(\"--language\", type=str, default='zh', help=\"output language: zh or en\")\n",
        "    parser.add_argument(\"--api_key\", type=str, default='sk-xxxxxxxxxxxx', help=\"your openai api key!\")\n",
        "    parser.add_argument('-f')  # absorb the notebook kernel's -f argument\n",
        "    args = parser.parse_args()\n",
        "    import time\n",
        "    start_time = time.time()\n",
        "    main(args=args)    \n",
        "    print(\"summary time:\", time.time() - start_time)\n",
        "    "
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Q_tN5aM0nXfm",
        "outputId": "04196e2a-f365-4d4b-d131-6f831e58f821"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Key word: deep reinforcement learning\n",
            "Query: all: reinforcement learning\n",
            "Sort: SortCriterion.Relevance\n",
            "all search:\n",
            "0 Some Insights into Lifelong Reinforcement Learning Systems 2020-01-27 07:26:12+00:00\n",
            "abs_text: A lifelong reinforcement learning system is a learning system that has the ability to learn through trail-and-error interaction with the environment over its lifetime. In this paper, I give some arguments to show that the traditional reinforcement learning paradigm fails to model this type of learning system. Some insights into lifelong reinforcement learning are provided, along with a simplistic prototype lifelong reinforcement learning system.\n",
            "------------------------------\n",
            "1 Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey 2021-08-25 23:01:48+00:00\n",
            "abs_text: Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start with comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision, i.e. (i)landmark localization (ii) object detection; (iii) object tracking; (iv) registration on both 2D image and 3D image volumetric data (v) image segmentation; (vi) videos analysis; and (vii) other applications. Each of these categories is further analyzed with reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision\n",
            "------------------------------\n",
            "filter_keys: reinforcement learning\n",
            "筛选后剩下的论文数量：\n",
            "filter_results: 2\n",
            "filter_papers:\n",
            "0 Some Insights into Lifelong Reinforcement Learning Systems 2020-01-27 07:26:12+00:00\n",
            "1 Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey 2021-08-25 23:01:48+00:00\n",
            "All_paper: 2\n",
            "paper_path: ./pdf_files/all  reinforcement learni-2023-03-16-05/Some Insights into Lifelong Reinforcement Learning Systems.pdf\n",
            "section_page_dict {'Abstract': 0, 'Introduction': 0, 'Experiment': 5, 'Results': 6, 'References': 7}\n",
            "0 Abstract 0\n",
            "1 Introduction 0\n",
            "start_page, end_page: 0 5\n",
            "2 Experiment 5\n",
            "start_page, end_page: 5 6\n",
            "3 Results 6\n",
            "start_page, end_page: 6 7\n",
            "4 References 7\n",
            "start_page, end_page: 7 9\n",
            "paper_path: ./pdf_files/all  reinforcement learni-2023-03-16-05/Deep Reinforcement Learning in Computer Vision_ A Comprehensive Survey.pdf\n",
            "section_page_dict {'Abstract': 0, 'Introduction': 0, 'Approaches': 56, 'Discussion': 61, 'Conclusion': 64, 'References': 65}\n",
            "0 Abstract 0\n",
            "1 Introduction 0\n",
            "start_page, end_page: 0 56\n",
            "2 Approaches 56\n",
            "start_page, end_page: 56 61\n",
            "3 Discussion 61\n",
            "start_page, end_page: 61 64\n",
            "4 Conclusion 64\n",
            "start_page, end_page: 64 65\n",
            "5 References 65\n",
            "start_page, end_page: 65 103\n",
            "summary_result:\n",
            " \n",
            "\n",
            "1. Title: Some Insights into Lifelong Reinforcement Learning Systems (一些关于终身强化学习系统的洞见)\n",
            "\n",
            "2. Authors: Changjian Li\n",
            "\n",
            "3. Affiliation: Department of Electrical and Computer Engineering, University of Waterloo, Canada. Correspondence to: Changjian Li <changjian.li@uwaterloo.ca>.\n",
            "\n",
            "4. Keywords: Lifelong reinforcement learning, scalar reward reinforcement learning, Q-learning, environment\n",
            "\n",
            "5. Urls: http://arxiv.org/abs/2001.09608v1\n",
            "\n",
            "6. Summary:\n",
            "\n",
            "- (1):本文探讨的是终身强化学习系统，即具有终身学习能力的学习系统。作者主张传统的强化学习范式无法很好地模拟这种学习方式。\n",
            "- (2):传统的强化学习范式注重学习器不同世代之间的学习调整，而不是学习器自身的学习与调整。对于一个学习器而言，获得的累积奖励只有在生命期末才能被测量。因此，这种范式适用于跨代（agent）强化学习，但不适用于终身强化学习。这种现象引发了对终身强化学习系统的进一步研究，同时也为本文探讨的提出提供了动力。\n",
            "- (3):针对常规强化学习的 Q-learning 算法，本文提出了终身强化学习的思路，并设计了一个原型系统。在该系统中，传承历代的 Q 值估计变为协助强化学习的关键信号之一，并且在实现上，学习机制被设定在生成式模型内部。\n",
            "- (4):本文提出的深度 Q 网络的终身强化学习原型系统在 Atari 游戏中进行了测试，相比于传统强化学习算法，本文算法表现出色且鲁棒性良好。\n",
            "prompt_token_used: 3401 completion_token_used: 556 total_token_used: 3957\n",
            "response_time: 40.345 s\n",
            "conclusion_result:\n",
            " 8. Conclusion: \n",
            "\n",
            "- (1): 本文章的意义在于提出了一种具有终身学习能力的学习系统--终身强化学习系统，并在 Atari 游戏中进行了测试，证明其具有出色的表现和较强的鲁棒性。\n",
            "- (2): 创新点：本文提出了基于终身学习的 Q-learning 算法，为终身强化学习系统的设计提供了一种新的思路； 性能：通过在 Atari 游戏的测试中，显示出本文算法的优异表现；工作量：本文所涉及的算法、模型设计、参数调整等方面都给出了详细的说明，对后续研究提供了借鉴。\n",
            "prompt_token_used: 850 completion_token_used: 216 total_token_used: 1066\n",
            "response_time: 41.321 s\n",
            "summary_result:\n",
            " 1. Title: Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey (计算机视觉中的深度强化学习综述)\n",
            "2. Authors: Ngan Le, Vidhiwar Singh Rathour, Kashu Yamazaki, Khoa Luu, Marios Savvides\n",
            "3. Affiliation: Ngan Le隶属于卡内基梅隆大学, Vidhiwar Singh Rathour, Kashu Yamazaki, Khoa Luu, Marios Savvides隶属于史蒂文斯理工学院 (Ngan Le belongs to Carnegie Mellon University, Vidhiwar Singh Rathour, Kashu Yamazaki, Khoa Luu, Marios Savvides belong to Stevens Institute of Technology)\n",
            "4. Keywords: Deep reinforcement learning, computer vision, object detection, image segmentation, landmark localization, object tracking, image registration, videos analysis (深度强化学习，计算机视觉，目标检测，图像分割，标志定位，目标跟踪，图像配准，视频分析)\n",
            "5. Urls: Paper: http://arxiv.org/abs/2108.11510v1. Github: None.\n",
            "6. Summary: \n",
            "\n",
            "   - (1):本文介绍了深度强化学习在计算机视觉领域的最新研究进展，旨在为读者提供关于RL/DRL原理的知识和如何使用DRL解决计算机视觉任务的最新实例的全面资料。 \n",
            "   - (2):相比先前的工作，作者分类并讨论了深度强化学习的代表性应用，包括标志定位，目标检测，图像分割等，并详细比较了现有算法的优缺点。同时，本文为读者研究深度强化学习的开放问题和未来研究方向提供了建议。 \n",
            "   - (3):作者介绍了深度学习、强化学习和深度强化学习的理论知识，详细阐述了深度强化学习的主要技巧，并分类讨论了模型基于和模型无关的RL。在模型基于和模型无关的RL模型下，本文重点介绍了价值函数方法、策略梯度方法、演员-评论家方法的主要技术。  \n",
            "   - (4):本文讨论了深度强化学习在计算机视觉应用中的表现，包括标志定位、目标检测、目标跟踪、图像配准、对象分割、视频分析和其他应用。每个应用类别首先介绍了问题，并详细讨论了该领域中的最新方法和性能。结果表明，相比传统机器学习和其他深度学习方法，深度强化学习的MLP和CNN模型在计算机视觉领域中的各类任务中均取得了优异的结果，取得了很好的性能。\n",
            "prompt_token_used: 3259 completion_token_used: 811 total_token_used: 4070\n",
            "response_time: 56.768 s\n",
            "method_result:\n",
            " 7. Methods: \n",
            "\n",
            "- (1): 本文介绍了深度强化学习在计算机视觉领域的代表性应用，分类讨论了模型基于和模型无关的RL，并详细比较了现有算法的优缺点。本文还对每个应用领域的问题进行了介绍，并讨论了最新方法和性能，以表达出深度强化学习在计算机视觉中所取得的成就。\n",
            "\n",
            "- (2): 对于模型基于的RL算法，本文详细介绍了价值函数方法、策略梯度方法、演员-评论家方法等主要技术，并分类讨论了应用和性能，介绍了模型基于和模型无关的RL学习中的最新算法和性能实例。\n",
            "\n",
            "- (3): 本文使用公认的计算机视觉基准数据集对使用深度强化学习的方法进行实验，如COCO对象检测数据集、PASCAL VOC对象检测数据集、ImageNet数据集等。为评估各种方法的性能，本文将比较重点放在基于深度强化学习的方法和深度学习标准方法之间的比较。 \n",
            "\n",
            "- (4): 讨论了未来几年内深度强化学习面临的挑战和可能解决的问题，如在远程传感器阵列中进行目标跟踪的新方法、RL与元学习结合的新范例和评估强化学习算法健壮性的更好方法。本文也提到了仍需解决的挑战，如通用性和泛化能力的问题、采样机制和计算效率、证明收敛性和长期稳定性的问题等。\n",
            "prompt_token_used: 3259 completion_token_used: 509 total_token_used: 3768\n",
            "response_time: 39.082 s\n",
            "conclusion_result:\n",
            " 8. Conclusion: \n",
            "- (1): 本文对深度强化学习在计算机视觉领域应用的最新研究进展进行了全面综述，为研究深度强化学习及其应用的学者提供了重要的参考资料，具有重要的学术价值。\n",
            "- (2): 创新点：本文针对深度强化学习在计算机视觉领域的应用进行了全面而深入的综述，尤其是对模型基于和模型无关的RL模型下的价值函数方法、策略梯度方法、演员-评论家方法等主要技术进行了分类讨论和详细介绍，并使用公认的计算机视觉基准数据集对使用深度强化学习的方法进行了实验。性能：本文对深度强化学习在计算机视觉各个应用领域的优缺点进行了全面阐述，并对深度强化学习在计算机视觉领域中的各类任务中的表现进行了分析和比较。工作量：本文工作量较大，涉及到深度强化学习、计算机视觉、MLP、CNN、RNN、模型基于RL、模型无关的RL、多种计算机视觉数据集及其数据评价指标等内容，需要具备一定的专业知识。\n",
            "prompt_token_used: 1894 completion_token_used: 414 total_token_used: 2308\n",
            "response_time: 32.787 s\n",
            "summary time: 217.2659273147583\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "6Aasm9hi6nYQ"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}